Manual Unpacking

Uncompressing/Unpacking A Compressed/Packed File Manually

by ytc_ [tNO '99]
5th March, 1999


INTRODUCTION

     Packed or compressed targets are usually executables (*.exe) or dynamic link libraries (*.dll) whose opcodes have been compressed or packed, by means of byte manipulation and a compression algorithm, to get a smaller file. When these files are disassembled using Windows Disassembler (W32Dasm) or Interactive Disassembler (IDA), the result is either nothing, or garbage with only a small bit of correctly disassembled opcodes. There are unpackers for certain packers on the web, but sometimes they don't work because the packer authors found ways to defeat them. Hence the need for manual unpacking.

     Before I continue any further, I must thank Iczelion for helping me unpack my first executable. He explained a lot to me about the PE format and import tables, and gave me directions and steps for unpacking a file. THANKS A LOT!! ;-)


TARGET

     The target I worked on is Powerstrip v2.35.01, available from Entech Taiwan and packed with ASPack. However, any target packed with ASPack can be used, although the unpacking routines might differ slightly.


TOOLS USED

     All the tools used in this essay (Softice, IDA, Procdump, Icedump or Softdump, and Ultraedit) are available from Iczelion's site or mine, although the copies on mine aren't always the latest versions. I'll assume that you know how to use your tools and have already set them up correctly.


ESSAY

     As we all know, packers compress executables and dynamic link libraries to make them smaller, and also to protect the target from casual crackers. You might find the place you want to patch in memory (using Softice, of course), but since the target is packed, you can't find that location in the file on disk. Now, there's a catch with these compressed exes and dlls: they can *NOT* run in their compressed state. They *MUST* be unpacked in memory *FIRST*, before control jumps to the real instructions. Using this knowledge, we'll be able to work our way through this.

     I will not explain how to get to the main unpacking routine. Using Softice and IDA, you *should* be able to find where the routine is. If you can't, then I suggest you gain more experience first. Here is a snippet of the main routine. I copied it from IDA and edited it a little so that it looks the same as what you see in Softice.

005B0000 60		      pusha	 
005B0001 E8 00 00 00 00	      call	 005B0006 ; calling the next opcode
005B0006 5D		      pop	 ebp ; getting current address
005B0007 81 ED AE 98 43 00    sub	 ebp, 004398AE ; [does some calculation
005B000D B8 A8 98 43 00	      mov	 eax, 004398A8 ; to get ImageBase
005B0012 03 C5		      add	 eax, ebp      ; of exe at eax]
005B0014 2B 85 12 9D 43 00    sub	 eax, [00439D12+ebp] ; image base (00400000)
005B001A 89 85 1E 9D 43 00    mov	 [00439D1E+ebp], eax ; save image base
005B0020 80 BD 08 9D 43 00 00 cmp	 byte ptr [00439D08+ebp], 0 ; compare byte with 0
005B0027 75 15		      jnz	 005B003E ; jump over magic calls if not 0
005B0029 FE 85 08 9D 43 00    inc	 byte ptr [00439D08+ebp] ; increase value at location
005B002F E8 1D 00 00 00	      call	 005B0051 ; Magic Call 1
005B0034 E8 73 02 00 00	      call	 005B02AC ; Magic Call 2
005B0039 E8 0A 03 00 00	      call	 005B0348 ; Magic Call 3
005B003E 8B 85 0A 9D 43 00    mov	 eax, [00439D0A+ebp]
005B0044 03 85 1E 9D 43 00    add	 eax, [00439D1E+ebp] ; get real AddressOfEntryPoint
005B004A 89 44 24 1C	      mov	 [esp+1C], eax ; save it over EAX in the PUSHA frame
005B004E 61		      popa	 ; restore registers, eax = real entry point
005B004F FF E0		      jmp	 eax ; jump to real entry point
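
     The 'call the next instruction, pop the return address' trick at 005B0001 is how the loader finds out where it is running: subtracting the address the code was assembled at leaves a delta in ebp, which is then added to every [xxxxxxxx+ebp] reference. The C sketch below redoes that arithmetic with the constants from the listing. The dword at [00439D12+ebp] is not shown, so the value 1B0000 used for it is an assumption, taken from the Virtual Offset of the last section (.data) in the Procdump table later in this essay.

```c
#include <assert.h>
#include <stdint.h>

/* Operand of "sub ebp, 004398AE" at 005B0007: link-time address of the pop. */
#define LINK_TIME_POP_ADDR 0x004398AEu
/* Operand of "mov eax, 004398A8" at 005B000D: link-time address of the loader's first byte. */
#define LINK_TIME_START    0x004398A8u

/* ebp after "call 005B0006 / pop ebp / sub ebp, 004398AE":
   run-time address minus link-time address (wraps modulo 2^32, like the CPU). */
uint32_t delta_offset(uint32_t runtime_pop_addr)
{
    return runtime_pop_addr - LINK_TIME_POP_ADDR;
}

/* 005B000D..005B0014: run-time start of the loader minus the dword at
   [00439D12+ebp], which this sketch ASSUMES is the loader section's RVA. */
uint32_t image_base(uint32_t delta, uint32_t loader_rva)
{
    uint32_t loader_start = LINK_TIME_START + delta;   /* eax after add eax, ebp */
    assert(loader_start >= loader_rva);                 /* base cannot go below zero */
    return loader_start - loader_rva;
}

/* 005B003E..005B0044: entry point RVA (dword at [00439D0A+ebp]) plus image base. */
uint32_t real_entry_point(uint32_t oep_rva, uint32_t base)
{
    return oep_rva + base;
}
```

     With these values, delta_offset(0x005B0006) returns 176758 and image_base(0x176758, 0x1B0000) returns 400000, matching the comment at 005B0014. The entry point RVA at [00439D0A+ebp] is not in the listing, so read it in Softice and pass it to real_entry_point.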

     With my comments above, any dummy can guess that the three Magic Calls are the heart of this unpacking routine. So, let's look into the first Magic Call.

005B0051 80 BD 2F 9E 43 00 00 cmp	 byte ptr [00439E2F+ebp], 0
005B0058 74 1D		      jz	 005B0077 ; looking in Softice, this jump is taken
...
005B0077 8D B5 26 9D 43 00    lea	 esi, [00439D26+ebp]
005B007D 83 3E 00	      cmp        [esi], 0 ; checks for compressed sections
005B0080 0F 84 EE 00 00 00    jz	 005B0174 ; jump if there isn't any more
005B0086 8D 85 86 9D 43 00    lea	 eax, [00439D86+ebp] ; get offset of "kernel32.dll"
005B008C 50		      push       eax 
005B008D FF 95 74 9E 43 00    call       [00439E74+ebp] ; call GetModuleHandle
005B0093 8B F8		      mov	 edi, eax ; move handle of kernel32.dll to edi
005B0095 8D 9D 93 9D 43 00    lea	 ebx, [00439D93+ebp] ; get offset of "VirtualAlloc"
005B009B 53		      push       ebx
005B009C 50		      push       eax
005B009D FF 95 70 9E 43 00    call       [00439E70+ebp] ; call GetProcAddress
005B00A3 89 85 76 9D 43 00    mov	 [00439D76+ebp], eax ; save address
005B00A9 8D 9D A0 9D 43 00    lea	 ebx, [00439DA0+ebp] ; get offset of "VirtualFree"
005B00AF 53		      push       ebx
005B00B0 57		      push       edi
005B00B1 FF 95 70 9E 43 00    call       [00439E70+ebp] ; call GetProcAddress
005B00B7 89 85 7A 9D 43 00    mov	 [00439D7A+ebp], eax ; save address
005B00BD 8D B5 26 9D 43 00    lea	 esi, [00439D26+ebp]
005B00C3 8B 46 04             mov	 eax, [esi+4] ; get VirtualSize of section
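005B00C6 6A 04		      push       4 ; flProtect = PAGE_READWRITE
005B00C8 68 00 10 00 00	      push       1000 ; flAllocationType = MEM_COMMIT
005B00CD 50		      push       eax ; dwSize = VirtualSize of section
005B00CE 6A 00		      push       0 ; lpAddress = NULL (system picks it)
005B00D0 FF 95 76 9D 43 00    call       [00439D76+ebp] ; call VirtualAlloc
005B00D6 89 85 22 9D 43 00    mov	 [00439D22+ebp], eax ; save address of temporary buffer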
005B00DC 56		      push       esi
005B00DD 8B 1E		      mov        ebx, [esi]
005B00DF 03 9D 1E 9D 43 00    add	 ebx, [00439D1E+ebp]
005B00E5 50		      push       eax
005B00E6 53		      push       ebx
005B00E7 E8 89 00 00 00	      call       005B0175 ; [unpacks section to space allocated
005B00EC 83 C4 08             add        esp, 8   ; by VirtualAlloc]
005B00EF 3B 46 04	      cmp	 eax, [esi+4] ; is size of unpacked data correct?
005B00F2 74 0B		      jz	 005B00FF ; jump if it is
...
005B00FF 80 BD 09 9D 43 00 00 cmp	 [00439D09+ebp], 0
005B0106 75 39		      jnz	 005B0141 ; jump not taken when unpacking 1st section
005B0108 FE 85 09 9D 43 00    inc	 byte ptr [00439D09+ebp]
005B010E 50		      push       eax
005B010F 51		      push       ecx
005B0110 56		      push       esi
005B0111 53		      push       ebx
005B0112 8B C8		      mov	 ecx, eax
005B0114 83 E9 05	      sub	 ecx, 5
005B0117 8B B5 22 9D 43 00    mov	 esi, [00439D22+ebp]
005B011D 33 DB		      xor	 ebx, ebx
005B011F		      or	 ecx, ecx ; [lines 005B011F to 005B013B scan
005B0121 74 1A		      jz	 005B013D ; the first section for E8 (call)
005B0123 AC		      lodsb               ; and E9 (jmp) opcodes and subtract
005B0124 3C E8		      cmp        al, E8   ; each one's position from its operand]
005B0126 74 08		      jz	 005B0130
005B0128 3C E9		      cmp	 al, E9
005B012A 74 04		      jz	 005B0130
005B012C 43		      inc	 ebx
005B012D 49		      dec	 ecx
005B012E EB EF		      jmp        005B011F
005B0130 29 1E                sub	 [esi], ebx
005B0132 83 C3 05	      add	 ebx, 5
005B0135 83 C6 04	      add	 esi, 4
005B0138 83 E9 05	      sub	 ecx, 5
005B013B EB E2		      jmp	 005B011F
005B013D 5B		      pop        ebx
005B013E 5E		      pop	 esi
005B013F 59		      pop	 ecx
005B0140 58		      pop	 eax
005B0141 8B C8		      mov        ecx, eax
005B0143 8B 3E		      mov	 edi, [esi]
005B0145 03 BD 1E 9D 43 00    add	 edi, [00439D1E+ebp]
005B014B 8B B5 22 9D 43 00    mov	 esi, [00439D22+ebp]
005B0151 F3 A4		      repe movsb ; copies unpacked data to exe's address space
005B0153 5E		      pop	 esi
005B0154 8B 85 22 9D 43 00    mov	 eax, [00439D22+ebp]
005B015A 68 00 80 00 00	      push       8000 ; dwFreeType = MEM_RELEASE
005B015F 6A 00		      push       0 ; dwSize = 0
005B0161 50		      push       eax ; lpAddress = temporary buffer
005B0162 FF 95 7A 9D 43 00    call       [00439D7A+ebp] ; call VirtualFree
005B0168 83 C6 08	      add	 esi, 8
005B016B 83 3E 00	      cmp	 [esi], 0 ; anymore sections to unpack?
005B016E 0F 85 4F FF FF FF    jnz	 005B00C3 ; jump if there is
005B0174 C3		      retn    

     A long, but straightforward piece of code. First, the addresses of the functions VirtualAlloc and VirtualFree are retrieved with GetProcAddress. Then, for each packed section, the program allocates a temporary buffer with VirtualAlloc and unpacks the section into it. If this is the first section, more calculation is done on it: the loop at 005B011F to 005B013B looks for every E8 (call) and E9 (jmp) opcode and subtracts the opcode's position in the buffer from the 32-bit operand that follows it, presumably restoring the relative call and jump targets that the packer had converted to make the code compress better. Later, the unpacked data is copied into the exe's actual address space, and the temporary buffer is freed by a call to VirtualFree. The whole routine loops once per packed section, that is 4 times for the 4 packed sections, which are CODE, DATA, .idata and .rsrc if you're using PowerStrip as the target. I got the names of the sections by looking at Softice's code window 'title bar' when the unpacked data is copied there at line 005B0151.
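
     The extra pass over the first section (005B0112 to 005B013B) is easier to follow in C. The sketch below is my own translation of that loop, assuming the unpacked section sits in a byte buffer; pos plays the role of ebx. It stops at pos < len - 5 instead of waiting for ecx to reach exactly 0, which gives the same result on normal data but cannot run past the end of the buffer when an E8 or E9 byte falls within the last few bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86 stores dwords little-endian, so read and write them byte by byte. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* C translation of 005B0112 to 005B013B, run on the first unpacked section.
   len is the unpacked size (eax); pos is ebx, the offset of the byte lodsb just read. */
void undo_e8e9(uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 5)
        return;                              /* my guard: ecx = len - 5 would wrap */
    size_t limit = len - 5;                  /* mov ecx, eax / sub ecx, 5 */
    size_t pos = 0;                          /* xor ebx, ebx */
    while (pos < limit) {
        uint8_t op = buf[pos];               /* lodsb */
        if (op == 0xE8 || op == 0xE9) {      /* call rel32 / jmp rel32 */
            uint8_t *operand = buf + pos + 1;
            put_le32(operand, get_le32(operand) - (uint32_t)pos);   /* sub [esi], ebx */
            pos += 5;                        /* add ebx, 5 / add esi, 4 / sub ecx, 5 */
        } else {
            pos += 1;                        /* inc ebx / dec ecx */
        }
    }
}
```

     Note that this fixup runs before the copy at 005B0151, so a CODE section dumped from memory after the first Magic Call already has it applied; you don't need to redo it yourself.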

     Next, lets look at the second Magic Call.

005B02AC 8B 95 1E 9D 43 00    mov	 edx, [00439D1E+ebp] ; current image base address
005B02B2 8B 85 0E 9D 43 00    mov	 eax, [00439D0E+ebp] ; image base of exe before packing
005B02B8 2B D0		      sub	 edx, eax ; edx = difference (0 if they are equal)
005B02BA 74 75		      jz	 005B0331 ; jump if equal
...
005B0331 C3                   retn

     This second Magic Call puzzled me for a moment. The image base specified in the PE header is only the PREFERRED load address; the PE loader can load the file elsewhere if necessary. This call checks whether the current image base is the same as the preferred one. If it isn't, my guess is that the code after 005B02BA (not shown) relocates the image. But this would normally happen only to a dll, since an exe is normally loaded at its preferred image base.
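
     In case you want to see what 'relocating' involves, here is a minimal C sketch of standard PE base relocation: the IMAGE_BASE_RELOCATION blocks from WINNT.H with IMAGE_REL_BASED_HIGHLOW (type 3) entries. Whether ASPack's code after 005B02BA, which the listing skips, walks exactly this format is my assumption. The delta is the value left in edx at 005B02B8, and the image is assumed to be already mapped at its section VAs in one buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static void wr32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Applies base relocations to an image mapped at 'image' (size bytes).
   reloc_rva/reloc_size: the relocation entry of the PE header's data directory.
   delta: current image base - preferred image base (edx after 005B02B8).
   Returns the number of dwords patched, or -1 if the data looks broken. */
int apply_relocations(uint8_t *image, size_t size, uint32_t reloc_rva,
                      uint32_t reloc_size, uint32_t delta)
{
    assert(image != NULL);
    if (delta == 0)
        return 0;                                     /* preferred base: jz 005B0331 */
    if (reloc_rva > size || reloc_size > size - reloc_rva)
        return -1;
    size_t off = reloc_rva, end = (size_t)reloc_rva + reloc_size;
    int patched = 0;
    while (off + 8 <= end) {                          /* one block per 4 KB page */
        uint32_t page_rva   = rd32(image + off);      /* VirtualAddress of the block */
        uint32_t block_size = rd32(image + off + 4);  /* SizeOfBlock, header included */
        if (block_size < 8 || block_size > end - off)
            return -1;
        for (size_t e = off + 8; e + 2 <= off + block_size; e += 2) {
            uint16_t entry = rd16(image + e);         /* type: top 4 bits, offset: low 12 */
            unsigned type = entry >> 12;
            size_t target = (size_t)page_rva + (entry & 0x0FFFu);
            if (type == 0)
                continue;                             /* IMAGE_REL_BASED_ABSOLUTE: padding */
            if (type != 3 || size < 4 || target > size - 4)
                return -1;                            /* only HIGHLOW is handled here */
            wr32(image + target, rd32(image + target) + delta);
            patched++;
        }
        off += block_size;
    }
    return patched;
}
```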

     Finally, let's look at the third Magic Call.

005B0348 8B 95 1E 9D 43 00    mov	 edx, [00439D1E+ebp] ; Image base
005B034E 8B B5 27 9E 43 00    mov	 esi, [00439E27+ebp] 
005B0354 8B BD 23 9E 43 00    mov	 edi, [00439E23+ebp]
005B035A 03 F2		      add	 esi, edx
005B035C 03 FA		      add	 edi, edx
005B035E 8B 46 0C	      mov	 eax, [esi+0C] ; dwRVAModuleName of this entry
005B0361 85 C0		      test       eax, eax
005B0363 0F 84 F6 00 00 00    jz	 005B045F
005B0369 03 C2		      add	 eax, edx ; get offset of dll name
005B036B 8B D8		      mov	 ebx, eax
005B036D 50		      push       eax
005B036E FF 95 74 9E 43 00    call       [00439E74+ebp] ; call GetModuleHandle
005B0374 85 C0		      test       eax, eax ; test if call succeeded
005B0376 75 67		      jnz	 005B03DF ; jump if yes
...
005B03DF 89 85 1F 9E 43 00    mov	 [00439E1F+ebp], eax ; save handle
005B03E5 C7 85 2B 9E 43 00 00+mov	 [00439E2B+ebp], 0
005B03EF 8B 95 1E 9D 43 00    mov	 edx, [00439D1E+ebp]
005B03F5 8B 06		      mov	 eax, [esi] ; dwRVAFunctionNameList
005B03F7 85 C0		      test       eax, eax
005B03F9 75 03		      jnz	 005B03FE ; jump if there is one
005B03FB 8B 46 10	      mov	 eax, [esi+10] ; else use dwRVAFunctionAddressList
005B03FE 03 C2		      add	 eax, edx
005B0400 03 85 2B 9E 43 00    add	 eax, [00439E2B+ebp]
005B0406 8B 18		      mov	 ebx, [eax]
005B0408 8B 7E 10	      mov	 edi, [esi+10h]
005B040B 03 FA		      add	 edi, edx
005B040D 03 BD 2B 9E 43 00    add	 edi, [00439E2B+ebp] ; points to this function's entry in the Import Table
005B0413 85 DB		      test       ebx, ebx
005B0415 74 3A		      jz	 005B0451
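005B0417 F7 C3 00 00 00 80    test       ebx, 80000000h ; top bit set = import by ordinal
005B041D 75 04		      jnz	 005B0423 ; jump if it is an ordinal
005B041F 03 DA		      add	 ebx, edx ; add image base
005B0421 43		      inc	 ebx ; [skip the 2-byte hint
005B0422 43		      inc	 ebx ; in front of the name]
005B0423 81 E3 FF FF FF 7F    and	 ebx, 7FFFFFFFh ; clear the ordinal flag
005B0429 53		      push       ebx ; points to dll function name (or is an ordinal)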
005B042A FF B5 1F 9E 43 00    push       [00439E1F+ebp]
005B0430 FF 95 70 9E 43 00    call       [00439E70+ebp] ; call GetProcAddress
005B0436 85 C0		      test       eax, eax ; did call succeed?
005B0438 75 0C		      jnz	 005B0446 ; jump if yes
...
005B0446 89 07		      mov	 [edi], eax ; save address of function to Import Table
005B0448 83 85 2B 9E 43 00 04 add	 [00439E2B+ebp], 4
005B044F EB 9E		      jmp	 005B03EF
005B0451 83 C6 14	      add	 esi, 14h
005B0454 8B 95 1E 9D 43 00    mov	 edx, [00439D1E+ebp]
005B045A E9 FF FE FF FF	      jmp	 005B035E
005B045F C3		      retn    

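     From this, we can see that the third Magic Call is actually doing the work of the dll loader: for each dll in the import directory, it gets the dll's handle, looks up every imported function with GetProcAddress and saves the function's address into the Import Table. And from the definition of the Import Table in the DEFINITIONS section below, we can conclude that we only need to dump the sections resulting from the first Magic Call. Now, let's concentrate on what we need to know before dumping the sections, that is, the PE file.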
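
     Before moving on to the PE file, here is the third Magic Call rewritten as a C sketch, to make its two loops visible. It assumes the image is mapped in one byte buffer at its section VAs and that the dll and function names inside it are 0-terminated. module_fn, proc_fn and the two stub functions are hypothetical stand-ins for GetModuleHandle and GetProcAddress, so the sketch can run without Windows. The fallback paths that the listing elides (after 005B0376 and 005B0438) simply return -1 here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for GetModuleHandle and GetProcAddress.
   get_proc receives name == NULL and the ordinal when importing by ordinal. */
typedef uint32_t (*module_fn)(const char *dll_name);
typedef uint32_t (*proc_fn)(uint32_t module, const char *name, uint32_t ordinal);

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static void wr32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Mirror of 005B0348 to 005B045F. dir_rva: RVA of the first import directory entry.
   Returns the number of functions resolved, or -1 on failure. */
int resolve_imports(uint8_t *image, size_t size, uint32_t dir_rva,
                    module_fn get_module, proc_fn get_proc)
{
    assert(image != NULL && get_module != NULL && get_proc != NULL);
    int resolved = 0;
    for (size_t d = dir_rva; ; d += 20) {                 /* add esi, 14h */
        if (d + 20 > size)
            return -1;
        uint32_t name_rva = rd32(image + d + 0x0C);       /* dwRVAModuleName */
        if (name_rva == 0)
            return resolved;                              /* jz 005B045F: no more dlls */
        if (name_rva >= size)
            return -1;
        uint32_t module = get_module((const char *)image + name_rva);
        if (module == 0)
            return -1;                                    /* elided fallback */
        uint32_t names = rd32(image + d);                 /* dwRVAFunctionNameList */
        uint32_t iat = rd32(image + d + 0x10);            /* dwRVAFunctionAddressList */
        if (names == 0)
            names = iat;                                  /* 005B03FB */
        for (size_t i = 0; ; i += 4) {                    /* [00439E2B+ebp] += 4 */
            if (names + i + 4 > size || iat + i + 4 > size)
                return -1;
            uint32_t entry = rd32(image + names + i);
            if (entry == 0)
                break;                                    /* jz 005B0451: next dll */
            uint32_t addr;
            if (entry & 0x80000000u) {
                addr = get_proc(module, NULL, entry & 0x7FFFFFFFu);          /* by ordinal */
            } else {
                if ((size_t)entry + 2 >= size)
                    return -1;
                addr = get_proc(module, (const char *)image + entry + 2, 0); /* skip hint */
            }
            if (addr == 0)
                return -1;                                /* elided failure path */
            wr32(image + iat + i, addr);                  /* mov [edi], eax */
            resolved++;
        }
    }
}

/* Hypothetical test stubs returning fake addresses. */
uint32_t stub_get_module(const char *dll_name)
{
    return strcmp(dll_name, "kernel32.dll") == 0 ? 0x10000000u : 0;
}

uint32_t stub_get_proc(uint32_t module, const char *name, uint32_t ordinal)
{
    return name != NULL ? module + 0x1000u + (uint8_t)name[0] : module + ordinal;
}
```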

     We will first learn about section headers. Load up Procdump, use its PE Editor function and view the sections of your target. You should see something like this (I'm using Pstrip.exe as an example):

Name     Virtual Size  Virtual Offset  Raw Size  Raw Offset  Characteristics
----------------------------------------------------------------------------
CODE     000CB000      00001000        0004B200  00000400    C0000040
DATA     00002000      000CC000        00001A00  0004B600    C0000040
BSS      00005000      000CE000        00000000  0004D000    C0000040
.idata   00003000      000D3000        00001000  0004D000    C0000040
.tls     00001000      000D6000        00000000  0004E000    C0000040
.rdata   00001000      000D7000        00000200  0004E000    C0000040
.reloc   0000D000      000D8000        00000000  0004E200    C0000040
.rsrc    000CB000      000E5000        00024000  0004E200    C0000040
.data    00002000      001B0000        00001400  00072000    C0000040

     As we can see, there are 9 sections here. Every section consists of a header (its entry in the section table) and a body (its raw data in the exe). Each section header is 40 bytes long, and it's defined as follows in the WINNT.H file.

#define IMAGE_SIZEOF_SHORT_NAME 8

typedef struct _IMAGE_SECTION_HEADER {
     UCHAR   Name[IMAGE_SIZEOF_SHORT_NAME];
     union {
             ULONG   PhysicalAddress;
             ULONG   VirtualSize;
     } Misc;
     ULONG   VirtualAddress;
     ULONG   SizeOfRawData;
     ULONG   PointerToRawData;
     ULONG   PointerToRelocations;
     ULONG   PointerToLineNumbers;
     USHORT  NumberOfRelocations;
     USHORT  NumberOfLinenumbers;
     ULONG   Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;

     I will only touch on the fields we need to know for our dumping work. Name is an 8-byte field storing the name of the section, for example CODE, DATA, .idata or .rsrc. VirtualSize is the size of the section when the exe is mapped into memory on loading. VirtualAddress (for short, VA) is the starting address of the section when mapped into memory, relative to the image base; Procdump shows it as Virtual Offset. SizeOfRawData is the physical size of the section in the exe file, and PointerToRawData is the file offset of the section's data.
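
     To see these fields in the file itself, here is a minimal C sketch that reads the section table the way Procdump's PE Editor presents it. It assumes the whole exe has been read into a byte buffer; SectionInfo and read_sections are names I made up. The section table starts right after the optional header, whose size is stored in the file header that follows the "PE\0\0" signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The IMAGE_SECTION_HEADER fields used in this essay (names as in WINNT.H). */
typedef struct {
    char     Name[9];            /* 8 bytes in the file, 0-terminated here */
    uint32_t VirtualSize;        /* header offset 8 */
    uint32_t VirtualAddress;     /* header offset 12, relative to the image base */
    uint32_t SizeOfRawData;      /* header offset 16 */
    uint32_t PointerToRawData;   /* header offset 20 */
    uint32_t Characteristics;    /* header offset 36 */
} SectionInfo;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Reads up to max section headers from a PE file loaded into memory.
   Returns the number of sections, or -1 if the buffer does not look like a PE file. */
int read_sections(const uint8_t *file, size_t size, SectionInfo *out, int max)
{
    assert(file != NULL && out != NULL);
    if (size < 0x40 || file[0] != 'M' || file[1] != 'Z')
        return -1;                                  /* no DOS header */
    uint32_t pe = rd32(file + 0x3C);                /* e_lfanew: offset of "PE\0\0" */
    if ((size_t)pe + 24 > size || memcmp(file + pe, "PE\0\0", 4) != 0)
        return -1;
    int nsec = rd16(file + pe + 6);                 /* FileHeader.NumberOfSections */
    uint16_t opt = rd16(file + pe + 20);            /* FileHeader.SizeOfOptionalHeader */
    size_t table = (size_t)pe + 24 + opt;           /* section table follows optional header */
    if (nsec > max || table + (size_t)nsec * 40 > size)
        return -1;
    for (int i = 0; i < nsec; i++) {
        const uint8_t *h = file + table + (size_t)i * 40;   /* 40 bytes per header */
        memcpy(out[i].Name, h, 8);
        out[i].Name[8] = '\0';
        out[i].VirtualSize      = rd32(h + 8);
        out[i].VirtualAddress   = rd32(h + 12);
        out[i].SizeOfRawData    = rd32(h + 16);
        out[i].PointerToRawData = rd32(h + 20);
        out[i].Characteristics  = rd32(h + 36);
    }
    return nsec;
}
```

     Printing Name, VirtualSize, VirtualAddress, SizeOfRawData and PointerToRawData for each entry gives the Name, Virtual Size, Virtual Offset, Raw Size and Raw Offset columns of the table above.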

     Before we continue, let's outline what we should do to dump the sections and create the new exe.

  1. Gather the vital information about our exe that we need for dumping, and for making the new exe run once it is created. That is: the new entry point (already explained), the names of the sections we want to dump (already explained), the Virtual Sizes and Virtual Addresses of those sections (using Procdump), the Raw Offsets and Raw Sizes of EVERY section in the exe (using Procdump too) and the new address of our import table.
  2. Dump the sections, using each section's Virtual Size as the length and its address in memory, Image Base (400000 for Pstrip.exe) + VA, as the start.
  3. Glue the sections together using a good hex editor that allows hex copying and pasting, starting with the PE header, followed by the sections in the correct order. Ultraedit provides such a function.
  4. Modify the PE header to reflect the new entry point, import table, raw offsets and raw sizes of the new exe. This can easily be done using Procdump, or you may want to read up more on the PE header and modify it yourself with a hex editor.

     Now you can start gathering the information we need for dumping. You only need to know which sections to dump (we found these earlier: CODE, DATA, .idata and .rsrc), the address of each section in memory (Image Base + VA) and its Virtual Size. Procdump shows all of these values. Then dump the sections into separate files using The Owl's fantastic tool, Icedump (or, if you prefer, Quine's Softdump). The length of each dump is the section's Virtual Size, and its start address is Image Base (which is 400000) + VA; for example, CODE starts at 401000 and is CB000 bytes long.

     Next, we shall 'glue' these sections together using Ultraedit's fantastic copy & paste feature. But before that, we need to know a few more things: the Raw Size and Raw Offset of EVERY section. With this information, we can find where the sections that were never packed are located in the packed exe, and how big they are. Procdump shows this information too.

     Now, start by copying the PE header of the packed exe into a new file, up to the Raw Offset of the first section (CODE, at 400h for Pstrip.exe). Next, paste the dumped CODE section into this new file, starting at offset 400h. Note the new raw size of this section and the new raw offset of the next section, which is DATA. Do the same for every other section. Remember to note the new raw size and raw offset of every section, then replace the old values using Procdump. THIS STEP IS VERY IMPORTANT.
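
     The bookkeeping in this step is simple addition, so it can be checked before you start pasting. The C sketch below (Section, relayout and dump_address are my own names) assumes the sections are kept in file order, that each dumped section is exactly its Virtual Size long, that every other section keeps its Raw Size and is copied from the packed exe, and that all sizes are already multiples of the file alignment, as they are in the Pstrip.exe table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of Procdump's section table, plus a flag for sections we dumped. */
typedef struct {
    uint32_t vsize;       /* Virtual Size */
    uint32_t voffset;     /* Virtual Offset, i.e. the VA field */
    uint32_t raw_size;    /* Raw Size; replaced by relayout() for dumped sections */
    uint32_t raw_offset;  /* Raw Offset; replaced by relayout() */
    int      dumped;      /* 1 for CODE, DATA, .idata and .rsrc in Pstrip.exe */
} Section;

/* Start address of a dump: Image Base + VA. The length is s->vsize. */
uint32_t dump_address(uint32_t image_base, const Section *s)
{
    return image_base + s->voffset;
}

/* Recomputes raw sizes and offsets for the glued exe, sections in file order.
   first_raw is where the copied PE header ends (400h for Pstrip.exe).
   Returns the size of the new file. */
uint32_t relayout(Section *secs, size_t n, uint32_t first_raw)
{
    assert(secs != NULL);
    uint32_t next = first_raw;
    for (size_t i = 0; i < n; i++) {
        if (secs[i].dumped)
            secs[i].raw_size = secs[i].vsize;   /* a dump is Virtual Size bytes long */
        /* other sections keep their Raw Size; their bytes come from the packed exe */
        secs[i].raw_offset = next;
        next += secs[i].raw_size;
    }
    return next;
}
```

     With the Pstrip.exe table and CODE, DATA, .idata and .rsrc marked as dumped, the sketch gives DATA a raw offset of CB400, .rsrc a raw offset of D0600 and a total file size of 19CA00. These numbers come from the arithmetic only, not from a rebuilt file.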

     Next, modify the PE header to reflect the new entry point and import table. The problem now is that we do not know where the import table is. Fortunately, the solution is very simple, but we need to know how the import directory is structured. Please note that the value of the Import Table entry in the PE header is the address of the import directory relative to the image base, just like the Virtual Offset values in Procdump's section table. Here's the structure of the import directory, as given by Randy Kath from the Microsoft Developer Network Technology Group.

typedef struct tagImportDirectory
     {
     DWORD    dwRVAFunctionNameList;
     DWORD    dwUseless1;
     DWORD    dwUseless2;
     DWORD    dwRVAModuleName;
     DWORD    dwRVAFunctionAddressList;
     }IMAGE_IMPORT_MODULE_DIRECTORY,
      * PIMAGE_IMPORT_MODULE_DIRECTORY;

     I won't go through every field; the names say most of it. Only the last two fields, dwRVAModuleName and dwRVAFunctionAddressList, play an important role in determining the functions' addresses. dwRVAModuleName points to the name of the dll to load, for example, kernel32.dll. dwRVAFunctionAddressList points to an array of entries, and before the exe is loaded these entries point to the names of the functions imported from the dll. Well, not to the names exactly, but to the 2-byte hint that precedes each name (the two inc ebx instructions at 005B0421 and 005B0422 skip it); an entry with its top bit set holds an ordinal instead of a name pointer. If several dlls are used, there is one tagImportDirectory structure per dll, and the list ends with a structure whose dwRVAModuleName is 0. As I mentioned before, the value of the import table should point to the first structure. But where's the structure?? With the information I've given above, I'm sure you'll be able to think of a way to find it out yourself ;-).
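
     However you go about finding the structures, it helps to test a candidate address before typing it into Procdump. The C sketch below (check_import_directory is my own name) assumes the dumped sections have been laid out in one buffer at their VAs, so that an address relative to the image base is simply an index into it. It accepts a candidate only if every dwRVAModuleName points to a non-empty, 0-terminated string inside the image, every dwRVAFunctionAddressList is non-zero and inside the image, and the list ends with a structure whose dwRVAModuleName is 0, the same test the third Magic Call stops on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walks the 20-byte tagImportDirectory structures starting at dir_rva.
   Returns the number of dlls found, or -1 if this cannot be an import directory. */
int check_import_directory(const uint8_t *image, size_t size, uint32_t dir_rva)
{
    assert(image != NULL);
    for (int n = 0; ; n++) {
        size_t d = (size_t)dir_rva + (size_t)n * 20;   /* five DWORDs per structure */
        if (d + 20 > size)
            return -1;                                 /* ran off the image, no terminator */
        uint32_t module_name = rd32(image + d + 12);   /* dwRVAModuleName */
        uint32_t addr_list   = rd32(image + d + 16);   /* dwRVAFunctionAddressList */
        if (module_name == 0)
            return n;                                  /* terminator, as tested at 005B0361 */
        if (module_name >= size || addr_list == 0 || addr_list >= size)
            return -1;
        const uint8_t *nul = memchr(image + module_name, 0, size - module_name);
        if (nul == NULL || nul == image + module_name)
            return -1;                                 /* unterminated or empty dll name */
    }
}
```

     A result of 1 or more only means the candidate has the right shape; check that the strings really are the dll names you expect, too.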

     After making these modifications, save your newly created exe, keep your fingers crossed, pray hard, and execute it. ;-) If you have modified the PE file correctly, the exe should run, and the disassembler should be able to show the imported function names too.


FINAL NOTES

     In my opinion, every cracker should master manual unpacking instead of depending on unpackers, generic or specific, to unpack exes or dlls. This is because there are many variants of the same packer version, and these can fool an unpacker very easily. I was actually inspired by Marigold's 'virginity restoration' method while reading his essays on VBox protection schemes.

     Also, this method can generally be used for encryptors too, not only for packers.


DEFINITIONS

ImageBase. Preferred base address in the address space of a process to map the executable image to.

AddressOfEntryPoint. The location of the entry point for the application, relative to the image base.

VirtualSize. Indicates the size of the section when mapped to memory.

Import Table. A table of entries that point to the names (or hold the ordinals) of the dll functions the exe uses. Before execution reaches the entry point, these entries are overwritten with the actual addresses of the functions.

Sections. Sections contain the content of the file, including code, data, resources, and other executable information.